Nonparametric Representation of Policies and Value Functions: A Trajectory-Based Approach

Authors

Christopher G. Atkeson

Jun Morimoto

Abstract

A longstanding goal of reinforcement learning is to develop nonparametric representations of policies and value functions that support rapid learning without suffering from interference or the curse of dimensionality. We have developed a trajectory-based approach, in which policies and value functions are represented nonparametrically along trajectories. These trajectories, policies, and value functions are updated as the value function becomes more accurate or as a model of the task is updated. We have applied this approach to periodic tasks such as hopping and walking, which required handling discount factors and discontinuities in the task dynamics, and using function approximation to represent value functions at discontinuities. We also describe extensions of the approach to make the policies more robust to modeling error and sensor noise.


Similar resources

A New Vision-Based and GPS-Signal-Independent Approach in Jamming Detection and UAV Absolute Positioning Assessment

Unmanned Aerial Vehicle (UAV) positioning in outdoor environments is usually done with the Global Positioning System (GPS). Because the GPS signal has low power at the earth's surface, its performance is disrupted in environments contaminated by jamming attacks, and the accuracy of GPS-based UAV positioning degrades. A positioning error about tens of...


Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the rapid growth of spatial trajectory data, and the need to process it and extract useful information and meaningful patterns, have attracted many researchers to the field of spatio-temporal trajectory clustering. The processing and analysis of these trajectories have resulted in the extraction of useful information whic...


A Game Theoretic Approach for Greening, Pricing, And Advertising Policies in A Green Supply Chain

In this paper, greening, pricing, and advertising policies in a supply chain will be examined with government intervention. The supply chain has two members. First, a manufacturer seeking to determine the wholesale price and the greening level and second, a retailer that has to determine the advertising cost and the retail price. The government is trying to encourage the manufacturer to green t...


Evaluation Approaches of Value at Risk for Tehran Stock Exchange

The purpose of this study is to estimate the daily Value at Risk (VaR) for the total index of the Tehran Stock Exchange using parametric, nonparametric, and semi-parametric approaches. Conditional and unconditional coverage backtesting are used to evaluate the accuracy of the calculated VaR and to compare the performance of these approaches. In most cases, based on backtesting statistics results, ...


Hilbert Space Embeddings of POMDPs

A nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes’ rule to these distribution embeddings. Policies and value functions are defin...



Publication date: 2002
